Wang-Landau algorithm for continuous models and joint density of states 
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We present modified Wang-Landau algorithm for models with continuous degrees of freedom. We 
demonstrate this algorithm with the calculation of the joint density of states g(M, E) of ferromagnet 
Heisenberg models. The joint density of states contains more information than the density of states 
of a single variable-energy, but is also much more time-consuming to calculate. We discuss the 
strategies to perform this calculation efficiently for models with several thousand degrees of freedom, 
much larger than other continuous models studied previously with the Wang-Landau algorithm. 
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The Wang-Landau (WL) algorithm Q has been ap- 
plied to a broad spectrum of interesting problems, in- 
cluding statistical physics models such as Ising and Potts 
models spin glasses 0, and stochastic series expan- 

sion of quantum systems jj|, as well as models in other 
areas such as complex fluids y| and protein molecules Q . 
These applications have been successful due to two fea- 
tures of this algorithm. First, the random walker of the 
WL algorithm is not trapped by local energy minima, 
which is very important for systems with complex en- 
ergy landscapes. Secondly, by calculating the density of 
states, thermodynamic observables in a wide range of 
temperature, including the free energy, can be calculated 
with one single simulation. The efficiency of the WL algo- 
rithm has been quantitatively studied, and improvements 
have been proposed by several authors 0, E El EH- 
which have made the WL algorithm a standard tool for 
discrete models. However, in some areas such as bio- 
physics, most models have continuous degrees of freedom; 
and naturally, the density of states of two or more vari- 
ables, in contrast to the energy alone, is of much interest. 
Efficient algorithms for those problems are called for. 

In this letter, we discuss two generalizations of the WL 
algorithm: (1) physical models with continuous degrees 
of freedom (referred to as off-lattice models in Ref. lajV, 
a nd ( 21 density of states of more than one variable ja, 
|jl. Ill), which we refer to as joint density of states. The 
joint density of states is more useful than the density of 
states. For example, the free energy as a function of both 
temperature and magnetic field can be calculated from 
g(M,E), the joint density of states of both energy and 
magnetization. For the protein system studied in Ref. , 
with g(£, E), where £ is the reaction coordinate, the free 
energy for each £ calculated with the EXEDOS algorithm 
can be obtained by integrating <?(£, E)e~ E l T over E. The 
calculation of joint density of states of continuous models 
clearly covers many interesting problems. 

However, these generalizations are not straight- 
forward, and there are certain intrinsic difficulties which 



FIG. 1: (a) \ng(M,E) of a three-dimensional Heisenberg fer- 
romagnet of size L = 10 with cutoff energy at -2.8, determined 
upto an additive constant, (b) The magnetization at different 
temperature and external field evaluated from g(M, E). 



frustrate the implementation of the simple version of the 
WL algorithm. First, the density of states g(E) of a 
continuous model goes to zero at its maximum and mini- 
mum energies, which become logarithmic singularities as 
the WL algorithm evaluates \ng(E). In fact, In g(M, E) 
is usually singular on its boundary. Secondly, suppose 
we calculate the joint density of a 32x32 Ising model on 
a square lattice, which was used as a test problem by 
Wang and Landau Q . If every value of the magnetiza- 
tion is counted, the joint density of states g(M, E) of both 
energy and magnetization will cost about 10 3 times the 
CPU time of g(E), because g(M,E) contains about 10 6 
integers. Hence such calculations have been restricted 
to systems of small sizes |ll| . One might use a binning 
scheme to deal with this problem, and may also hope 
that binning (discretization) provides a generalization of 
the WL algorithm to systems with continuous degrees 
of freedom. However, a simple binning scheme becomes 
either very costly or inaccurate even for relatively small 
systems. Shell et. al. used interpolation among bins 
to handle this difficulty. We have identified the origin 
of the inefficiency of the binning scheme and propose a 
better solution to this problem. 
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We first show the result of our calculation for a three- 
dimensional Heisenberg ferromagnet in Fig. ^ The 
Hamiltonian of this model is H = — X}<ji>Sj ' 
where the summation goes over nearest neighbors on a 
cubic lattice of size L with periodic boundary conditions. 
This model has a ferromagnetic phase transition and dis- 
plays global spin rotational symmetry. Here we define 
E = H/L 3 and M = L^^Sf, and g(M,E) is the 
joint density of states of M and E. Figure H^a) shows 
In g(M, E) for negative E. A region with d\ng/dM = 
below E = —1.2 indicates a transition to the ferromag- 
netic phase with a global rotational symmetry. The log- 
arithmic divergence in w{M, E) near the ground state is 
obvious. In the following, we use the Heisenberg ferro- 
magnet as a prototype model to identify several features 
and intrinsic difficulties of continuous models. 

In general, the model we study has many microscopic 
degrees of freedom s = (si, . . . , sat), which labels the mi- 
croscopic states in the phase space fl. The Hamiltonian 
H(s) is a real-valued function of the microscopic degrees 
of freedom, so is the order paramter, e.g. magnetization 
M(s) = N^ 1 Sf for the Heisenberg ferromagnet. The 
joint density of states is defined as 

g(M, E) = J Hid Si S(E - H(s))8(M - M(s)), (1) 

where |fl| is the volume of the phase space, and the inte- 
gral is replaced by a summation for discrete models. We 
refer to a pair (M, E) as a macroscopic state, and define 
the macroscopic phase space A = {(M, E)\ 3s, H(s) — 
E,M(s) = M}. Obviously g(M, E) > is a probability 
measure of A induced by a priori probability measure of 
f2. Usually this measure is almost a <5-function in A, dom- 
inated by overwhelmingly many states in the high tem- 
perature limit. Due to the Boltzman distribution, for 
finite temperature properties, we are mostly interested 
in the very small values of g(M, E). In the following, we 
denote (M,E) collectively with x. (In case of g(E), the 
macroscopic state x is specified by the energy alone.) If 
one tries to evaluate g(x) by sampling f2 uniformly, then 
there is virtually no chance to sample a finite tempera- 
ture macroscopic state. The WL algorithm learns g(x) in 
the runtime with a simple strategy. In a sentence, when a 
trial move Xi — > Xf is suggested, it is accepted with prob- 
ability Ai^f = min{l, exp[w(xi)—w(xf)]}, where w(x) is 
an approximation to In (7(2;), and if the move is accepted 
w(xf) is updated with 7 + w(xf) otherwise w(xi) is up- 
dated with 7 + u>(xi), where 7 = In/, and / is referred to 
as the modification factor. Previous studies 0,0 showed 
that for a discrete set of x, w(x) quickly converges to 
lng(x) within some statistical error proportional to x/7 
for 7^0. For a continuous system, one simply cannot 
update g(x) point by point. Some modification of the up- 
date scheme is required along with a method to represent 
the continuous function w(x) in the computer memory. 
The simple binning scheme approximates g{x) with a 



piece-wise constant function, which has discontinuities 
on the boundary of the bins. This scheme works if g(x) 
varies very slowly in each bin, otherwise as the trial moves 
within each bin is not biased, the maximum point of 
g{x) in each bin is much more likely to be sampled than 
other points, which actually violates the spirit of the WL 
algorithm- to sample every macroscopic state uniformly. 
Reducing the size of the bins could overcome this diffi- 
culty for small systems, but generally for the joint density 
of states this would result in an excessively large number 
of bins to sample. Bilinear interpolation among neigh- 
boring bins was used by Shell et. al. to prevent the 
random walker from being trapped in one bin. However, 
to capture the curvature of a 2D surface, a large number 
of bins are still necessary. We will give an estimation of 
this number in later discussions. Alternatively, we have 
adopted an update scheme similar to the kernel density 
estimation 12j(KDE), which is consistent with the con- 
tinuous nature of g{x). In our simulations of joint density 
of states, we select a localized positive continuous kernel 
function k(x) > 0, and scale it by two constants 7 and 
6: k(x) — > jk(x/5). If the random walker arrives in Xq, 
then the continuous histogram w(x) is updated by 



w(x) — > w(x) + jk({x — xq)/S), 
and we express the acceptance rate as 



iin< 1, ( 



exp \na[w(xi) — w(x f)} 



(2) 



(3) 



where we have inserted a constant In a so that w(x) con- 
verges to logo, g{x). The continuity of w(x) ensures that 
the random walker is always properly biased. This up- 
date scheme is similar to the method of Ref. 0] , where 
the potential is updated by Gaussian kernels in a molec- 
ular dynamics simulation. In addition to the Gaussian 
kernel function k(x) = exp(— |x| 2 ), we also implemented 
the Epanechnikov kernel k(x) = [1 — |x| 2 ] + . Little dif- 
ference has been seen in the calculations with different 
kernel functions. By slightly modifying the approach in 
Ref. 0|, we can prove that this update scheme converges. 

Thus the single modification factor in the original WL 
algorithm is replaced by a triplet (a, 7, 6). In our simula- 
tions, we have used numbers between 0.0001 and 0.01 for 
7, and S corresponding to about 1/200 of the width of the 
energy window or the magnetization window. Unlike the 
original WL algorithm, we do not reduce 7 to extremely 
small values in the simulation, neither do we change 5 in 
the simulation. We have the freedom to change a pro- 
vided that w(x) is properly rescaled. A small a actually 
does not improve the accuracy of the calculation. 

From Ref. we know if we start from an unbiased 
initial Wq(x) — 0, wt(x) (where T labels the number of 
Monte Carlo steps) grows from the region of large g(x), 
and extends to unexplored region of small g(x). wt{x) 
can be written as: wt{x) — [C(T) + \og a g(x)+rT(x)] , 
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where C (T) is an increasing function of T only, rr (x) is 
a bounded error term controlled by the triplet (a, 7,(5). 

(x) also increases monoticically in the simulation, and 
remains zero in the unexplored region. One cannot ex- 
pect it to be flat as in the original WL algorithm for 
discrete models. The simulation should be stopped by 
other criteria, e.g. the visited area reaches a low energy 
cut-off. Then we continue the simulation with reduced 
7 to improve the accuracy in the visited area. If the re- 
sult is accurate, wt{x) increases uniformly in the visited 
area. T is estimated by counting the number of kernel 
functions to build up wt{x): 



T = 



r )k(x/5)dx 



wt{x)<Ix. 



(4) 



However we find that the initial accumulation in which 
At = supp{wt(x)} expands to A takes a very long time 
for joint density of states of two variables. The expan- 
sion of At slows down as the area of At increases. The 
reasons are two-fold. First, the simulation samples the 
macroscopic states within At uniformly, giving rise to a 
uniform growth there. This uniform growth takes a con- 
siderable CPU time, while only about a fraction |At| -1 ^ 2 
of Monte Carlo steps on the boundary of At happen to 
extend the simulation to the unexplored area. Secondly, 
close to the singular boundaries of A, Vlog Q g(x) be- 
comes very large, requiring a very small S (high resolution 
in the kernel function) to capture the large derivative. 

To avoid repeated sampling of the visited region At, 
and to push the simulation to the unexplored region, we 
find it is most efficient to introduce a global update of 
wt{x): when wt(x) is a good estimate of log Q g(x) inside 
At, we update wt(x) with the following formula: 



wt(x) — > wt (x) + k exp 



-A 



Wt{x) — LO 



e(w T (x)-io), (5) 



where O is the Heaviside step function. Basically, wt(x) 
is shifted up by an amount of k where wt{x) > to, and the 
exponential function removes the resultant discontinuity. 
Here to is required to be positive because the above local 
update scheme requires a minimum number of visits to 
give a correct estimation of the density of states. 

As a result of this global update, the random walker 
is confined close to <9At, only when the accumulation on 
the boundary exceeds k, is the random walker likely to 
come back to the interior of At- Then a short global 
sampling using Eq. J2J) covers up possible artifacts that 
result from the global update. Figure [21 illustrates one 
cycle of calculation with the global update. Ideally, the 
initial accumulation is decomposed into a number of cy- 
cles. In each cycle, the simulation works on a partic- 
ular subset of A. Let A = 0, Ao = max{log Q g(x)}, 
A n = {x| \og a g(x) > Ao — tik} for n > 0. The series of 
sets {A„} converges to A. In n th cycle of our calcula- 
tion with the global update Eq. JSJ, the random walker 
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FIG. 2: A cycle of the simulation with global update, (a) 
original wt(x), (b) wt(x) after the global update, (c) the 
increment accumulated beginning with the wt(x) in (b), (d) 
sum of (b) and (c) gives a new wt(x) in the end of this cycle. 

estimates g[x) in A n — A n _i, which roughly requires 



r )k{x/5)dx 



-A„_ 



"logaSO) - K]d.X 



Monte Carlo steps. By comparison with Eq. QJ, we see 
that instead of filling up the bulk of wt(x), the simula- 
tion only fills up a thin surface layer of wt{x) of thickness 
roughly given by k. Consequently, the algorithm with 
the global update saves a huge amount of Monte Carlo 
steps. Compared with distributing the random walkers 
to a number of "windows" 0, H, El, the global update 
has the advantage of automatically selecting the "win- 
dows" on the frontier of the simulation and avoiding the 
boundary errors Figure |3| shows that the global up- 
date saves 90% of the CPU time in a typical simulation, 
where we plot t as a function of maximum histogram 
W fts w(0, 0). Without the global update, since the incre- 
ment in W is due to the uniform accumulation of samples 
inside the visited area At, therefore dt/dW oc | At | . From 
Fig. ^ we can tell that At mainly grows towards lower 
energy. Hence we expect approximately |At| ~ aW + b, 
which results in a leading quadratic dependence t tx W 2 
in Fig. |3| With the global update, both the prefactor 
and the exponent of this relation are reduced. The short 
global samplings at the end of each cycle of the global 
update result in T oc W A , with A w 1.55 in Fig. 

We summarize the algorithm as the following: A calcu- 
lation from scratch is divided into an initial accumulation 
stage and a refining stage. In the initial accumulation 
stage, (1) we start the calculation with w(x) = 0, us- 
ing local updates Eq. (0). (2) As soon as w(x) > uj for 
some x, we apply the global update Eq. ifHjl. and continue 
the accumulation with local updates. wt(x) initially in- 
creases on the boundary of At- (3) After it resumes a 
uniform growth over At, we start the next cycle with 
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FIG. 3: Comparison of CPU time used by the calculations 
of g(M,E) with and without the global update for an L — 
5 Heisenberg model of ferromagnet, u) — 0.5 is used. The 
dashed line is a guide for the eye. 



another global update. The refining stage starts when 
wt{x) expands to the entire A or certain cut-off one as- 
sumes. Then we reduce 7 (and 5 if necessary) to continue 
the simulation with local updates only. When wt(x) re- 
sumes a uniform growth, we can further refine the result 
with reduced 7 or continue with the current parameters 
to decorrelate the statistical error and average over mul- 
tiple uncorrelated results of wt (x) . 

We calculate the thermodynamic quantities from 
g(M, E) with numerical integral . Fig. [Jb) shows the 
magnetization of Heisenberg ferromagnet as a function 
of external field and temperature. The specific heat in 
Fig. Q] shows a typical peak at T c of Heisenberg models. 
We also compare our results with that calculated with the 
original WL algorithm in Fig.|3| which evaluates g{E) on 
a grid of 3000 bins. They only differ slightly at low tem- 
peratures where both results show small errors. This er- 
ror comes from the binning or interpolation scheme used 
to represent the continuous g(E) or g(M, E) (not from 
the numerical integration) . Actually, given that the stan- 
dard deviation of the canonical distribution of the energy 
is a E = ^/THTJW w 0.036 for L = 10, T = 1, and 
e ps 0.1 for L — 5, the numerical integration requires 
enough data points within ge to be accurate. This crite- 
rion applies to both our kernel function updates and the 
bilinear interpolation scheme Q. A larger erg explains 
why the error is very small for L = 5 in Fig. In case 
of L = 10, the internal array we used to store g(M, E) 
has an energy resolution of 0.0012, which is comparable 
to the bin size (0.001) of the original WL algorithm we 
used for g(E). Consequently, they show errors of compa- 
rable sizes. One can conclude that for larger systems, the 
resolution in each macroscopic quantity is required to in- 
crease as ^/N to maintain the accuracy in the numerical 
integral, where N is the number of degrees of freedom. 

In summary, we have extended the WL algorithm 
to treate continuous systems and their joint density of 
states, proposed kernel function local updates and a 
global update to increase the efficiency of the algorithm. 



2.5 




FIG. 4: Specific heat of the Heisenberg ferromagnet of size 
L — 10, and 5, with comparison to the results from the origi- 
nal WL algorithm performed with a large number of bins. 

Our new strategies have potential applications to many 
complex systems with thousands of degrees of freedom. 
In particular, ther kernel function update benifits from 
the continuity of the model; and the global update ef- 
fectively drives the random walker to unexplored area, 
so that extreme values of macroscopic variables can be 
searched. If the calculations of g(M, E) we presented 
were performed with the original WL algorithm, the CPU 
time used would have increased roughly by a factor of ten. 

We thank D. Sanders for his helpful comments. This 
research is supported by the Department of Energy 
through the Laboratory Technology Research Program of 
OASCR and the Computational Materials Science Net- 
work of BES under Contract No. DE-AC05-00OR22725 
with UT-Battelle LLC, and also by NSF DMR-0341874. 



[1] F. Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 

(2001) ; Phys. Rev. E 64, 056101 (2001); Comput. Phys. 
Commun. 147, 674 (2002). 

[2] C. Yamaguchi, Y. Okabe, J. Phys. A 34, 8781 (2001). 
[3] Y. Okabe, Y. Tomita, and C. Yamaguchi, Comput. Phys. 

Commun. 146, 63 (2002). 
[4] M. Troyer, S. Wessel, and F. Alet, Phys. Rev. Lett. 90, 

120201 (2003). 

[5] M. S. Shell, P. G. Debenedetti, and A. Z. Panagiotopou- 

los, Phys. Rev. E 66,056703 (2002). 
[6] N. Rathore and J. J. de Pablo, J. Chem. Phys. 116, 7225 

(2002) ; N. Rathore, et. al, ibid. 118, 4285 (2003); 120, 
5781 (2004). 

[7] P. Dayal, et. al, Phys. Rev. Lett. 92, 097201 (2004). 
[8] C. Zhou and R. N. Bhatt, cond-mat/0306711 
[9] B. J. Schulz, K. Binder, M. Miiller, and D. P. Landau, 
Phys. Rev. E 67, 067102 (2003). 
[10] A. Malakis, A. Peratzakis, and N. G. Fytas, Phys. Rev. 

E 70, 066128 (2004). 
[11] D. P. Landau, Shan-Ho Tsai, and M. Exler, Am. J. Phys. 

72, 1294 (2004). 
[12] W. Hardle, M. Muller, S. Sperlich, and A. Werwatz, Non- 
parametric and Semiparametric Models (Springer, 2004). 
[13] Y. D. Wu, J. D. Schmitt, and R. Car, J. Chem. Phys. 
121, 1193 (2004). 



